Reviewer # 1
We thank the reviewers for their positive feedback and valuable suggestions. We address their comments below. In short, "pruning-at-initialization" methods have the advantage of lower overhead at training time. In contrast, our approach can be adopted even when the model is trained without any consideration of pruning. That said, we agree that providing a self-contained proof in the current form is also worthwhile.
Understanding the Gain from Data Filtering in Multimodal Contrastive Learning
Pareek, Divyansh, Oh, Sewoong, Du, Simon S.
The success of modern multimodal representation learning relies on internet-scale datasets. Because a large fraction of raw web data is of low quality, data curation has become a critical step in the training pipeline. Filtering with a trained model (i.e., teacher-based filtering), which leverages a pre-trained model to compute quality scores, has emerged as a successful solution. To explain the empirical success of teacher-based filtering, we characterize the performance of filtered contrastive learning under the standard bimodal data generation model. Denoting by $\eta\in(0,1]$ the fraction of data with correctly matched modalities among $n$ paired samples, we use a linear contrastive learning setup to show a provable benefit of data filtering: $(i)$ the error without filtering is upper- and lower-bounded by $\frac{1}{\eta\sqrt{n}}$, and $(ii)$ the error with teacher-based filtering is upper-bounded by $\frac{1}{\sqrt{\eta n}}$ in the large-$\eta$ regime, and by $\frac{1}{\sqrt{n}}$ in the small-$\eta$ regime.
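The teacher-based filtering step described in the abstract can be illustrated with a minimal sketch: a pretrained teacher provides embeddings for each (image, text) pair, the per-pair quality score is their cosine similarity, and only the highest-scoring fraction of pairs is retained. The function name, the synthetic embeddings, and the `keep_frac` parameter below are all illustrative assumptions, not the paper's setup.

```python
import numpy as np

def teacher_filter(img_emb, txt_emb, keep_frac=0.5):
    """Score each (image, text) pair by the cosine similarity of the
    teacher's embeddings and keep the top keep_frac fraction of pairs.
    Illustrative sketch only; names and parameters are assumptions."""
    # Normalize rows so the dot product equals cosine similarity.
    img = img_emb / np.linalg.norm(img_emb, axis=1, keepdims=True)
    txt = txt_emb / np.linalg.norm(txt_emb, axis=1, keepdims=True)
    scores = np.sum(img * txt, axis=1)           # per-pair quality score
    k = max(1, int(keep_frac * len(scores)))     # how many pairs to keep
    keep = np.argsort(scores)[::-1][:k]          # indices of best pairs
    return keep, scores

# Toy data: 6 correctly matched pairs plus 2 mismatched ones,
# mimicking a dataset where a fraction eta of pairs is well matched.
rng = np.random.default_rng(0)
base = rng.normal(size=(8, 16))
img_emb = base
txt_emb = base + 0.1 * rng.normal(size=(8, 16))  # matched: text near image
txt_emb[6:] = rng.normal(size=(2, 16))           # mismatched modalities
keep, scores = teacher_filter(img_emb, txt_emb, keep_frac=0.75)
```

On this toy example the filter retains exactly the matched pairs, which is the mechanism the analysis formalizes: filtering trades away some sample size to raise the effective fraction of correctly matched pairs.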
Symmetric Linear Dynamical Systems are Learnable from Few Observations
Vu, Minh, Lokhov, Andrey Y., Vuffray, Marc
We consider the problem of learning the parameters of an $N$-dimensional stochastic linear dynamical system, under both full and partial observations, from a single trajectory of length $T$. We introduce and analyze a new estimator that achieves a small maximum element-wise error in the recovery of symmetric dynamics matrices using only $T=\mathcal{O}(\log N)$ observations, irrespective of whether the matrix is sparse or dense. The estimator is based on the method of moments and does not rely on problem-specific regularization, which is especially important for applications such as structure discovery.
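A moment-based estimator of the kind the abstract refers to can be sketched as follows for the fully observed case $X_{t+1} = A X_t + W_t$ with symmetric $A$: since $A$ maps the lag-0 covariance to the lag-1 cross-covariance, one can estimate $\hat A = \hat C_1 \hat C_0^{-1}$ from empirical moments and symmetrize. This is a generic moment sketch under stated assumptions, not the paper's exact estimator or its $\mathcal{O}(\log N)$ analysis.

```python
import numpy as np

def moments_estimator(X):
    """Moment-based sketch for X_{t+1} = A X_t + W_t with symmetric A.
    Uses A_hat = C1_hat @ inv(C0_hat), then symmetrizes.
    Illustrative only; not the paper's exact estimator."""
    T = X.shape[0]
    C0 = X[:-1].T @ X[:-1] / (T - 1)   # empirical lag-0 covariance
    C1 = X[1:].T @ X[:-1] / (T - 1)    # empirical lag-1 cross-covariance
    A_hat = C1 @ np.linalg.inv(C0)
    return 0.5 * (A_hat + A_hat.T)     # enforce symmetry of the estimate

# Simulate a stable symmetric system and check element-wise recovery.
rng = np.random.default_rng(1)
N, T = 5, 20000
M = rng.normal(size=(N, N))
A = 0.25 * (M + M.T) / np.sqrt(N)      # symmetric, spectral radius < 1
X = np.zeros((T, N))
for t in range(T - 1):
    X[t + 1] = A @ X[t] + rng.normal(size=N)
A_hat = moments_estimator(X)
err = np.max(np.abs(A_hat - A))        # maximum element-wise error
```

Symmetry is what makes the problem easier than the general case: it halves the number of free parameters and ties the forward dynamics to the (time-reversed) backward dynamics, which the moment equations exploit.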